Abstract

It is well known that, as $n$ tends to infinity, the probability of satisfiability for a random 2-SAT
formula on $n$ variables, where each clause occurs independently with probability $\alpha/2n$, exhibits a sharp
threshold at $\alpha = 1$. We study a more general 2-SAT model in which each clause occurs independently
but with probability $\alpha_i/2n$, where $i \in \{0, 1, 2\}$ is the number of positive literals in that clause. We
generalize branching process arguments by Verhoeven (99) to determine the satisfiability threshold for
this model in terms of the maximum eigenvalue of the branching matrix.

1 Introduction



1.1 Background 

The $k$-satisfiability (in short $k$-SAT) problem is a canonical constraint satisfaction problem in theoretical
computer science. A $k$-SAT formula is a conjunction of $m$ clauses, each of which is a disjunction of length
$k$ chosen from $n$ boolean variables and their negations. Given a $k$-SAT formula, a natural question is to
find an assignment of the $n$ variables which satisfies the formula. The decision version of the problem is to
determine whether there exists an assignment satisfying the formula.

From the computational complexity perspective, the problem is well understood. The problem is NP-hard
for $k \ge 3$ [Coo71] and linear time solvable for $k = 2$ [APT82]. Much recent interest has been devoted
to the understanding of random $k$-SAT formulas where each clause is chosen independently with the same
probability and the expected number of clauses in the formula is $\alpha n$. This problem lies in the intersection
of three different subjects: statistical physics, discrete mathematics and complexity theory.

In statistical physics, the notion of 'phase transition' refers to a situation where systems undergo some
abrupt behavioral change depending on some external control parameter such as temperature. In the context
of random $k$-SAT formulas the natural parameter is the density of the formula $\alpha$, i.e., the ratio of the
number of clauses to the number of variables. Much recent research is devoted to understanding the critical
densities for random $k$-SAT problems. The most important one is the critical density for satisfiability,
i.e., the threshold at which a formula changes from being satisfiable with high probability to being unsatisfiable
with high probability [MMZ06, AP04]. Other thresholds involve the geometry of the solution space and the
performance of various algorithms (e.g. see [ART06, MMW07] and references therein).

The problem of 2-SAT is more amenable to analysis than $k$-SAT for $k \ge 3$. This is closely related to
the fact that 2-SAT can be solved in linear time and to a clear graph theoretic criterion which is equivalent
to satisfiability. The threshold for 2-SAT is well known to be $\alpha = 1$ (see [Goe92, Goe96, CR92, FdlV92])
and detailed information on the scaling window is given in [BBC+01]. For $k$-SAT with $k \ge 3$ there are
various bounds and conjectures on the critical threshold for satisfiability but the thresholds are not known
rigorously [Fri99, AP04, Ach00, PZ02].

In the current paper we establish the threshold of a more general 2-SAT model where the probability
of having a clause in the formula depends on the number of positive and negative literals in the clause.
Our proof is based on branching process arguments. Branching process techniques have been used before
to study the standard 2-SAT formulas in the unsatisfiability regime ($\alpha > 1$), for example see [Ver99]. We
generalize the arguments given in [Ver99] in the 2-type branching process set-up to analyze the general
2-SAT model. Our main contribution is in demonstrating that branching process arguments extend to a
multi-type setup. A well accepted idea in studying random graphs and constraint satisfaction problems is
that since the "local" structure of the problems is tree-like, processes defined on trees play a key role in analyzing
the problems. The classical example is the threshold for the existence of a "giant" component in random
graphs, where branching processes play a key role in the proof (see e.g. [JLR00, Bol01]). Some more recent
examples include [Mos01, Wei06, MWW08].

A seemingly closely related work is by Cooper et al. [CFS02], where the threshold for random 2-SAT
with a given literal degree distribution is derived. We note that the two papers are incomparable, since ours is
stated in terms of the distribution of clauses of different types while in [CFS02] the model is stated in terms
of the degrees of the literals. While the distribution of clauses determines the distribution of the degrees of
the literals, the converse does not hold. For example, a random 2-SAT formula with $2n$ positive-negative
clauses has the same literal degree distribution as a uniform random formula with $2n$ clauses. It is obvious
that while the former is always satisfiable, the latter is unsatisfiable with high probability.

1.2 Definitions and statements of main results



Let $x_1, x_2, \ldots, x_n$ be $n$ boolean variables. Writing $\bar{x}_i$ for the negation of $x_i$, these $n$ boolean variables give
us $2n$ literals $\{x_1, x_2, \ldots, x_n, \bar{x}_1, \bar{x}_2, \ldots, \bar{x}_n\}$. The two literals $x_i$ and $\bar{x}_i$ are called complementary to each
other ($x_i = 1$ iff $\bar{x}_i = 0$) with the convention that $\bar{\bar{x}}_i = x_i$. We will call the $n$ literals $x_i$, $i = 1, 2, \ldots, n$
positive literals and their complementary literals $\bar{x}_i$, $i = 1, 2, \ldots, n$ negative literals.

Given a literal $u$, $\mathrm{var}(u)$ denotes the corresponding variable, the notation naturally extending to a set
of literals $S$ by $\mathrm{var}(S) = \{\mathrm{var}(u) : u \in S\}$. Two literals $u$ and $v$ are said to be strongly distinct if $u \ne v$
and $u \ne \bar{v}$, or equivalently, if $\mathrm{var}(u) \ne \mathrm{var}(v)$. A 2-clause (which we will call simply a "clause" later) is a
disjunction $C = x \vee y$ of two strongly distinct literals. In this paper we will not allow $x \vee x$ or $x \vee \bar{x}$ as
valid clauses. A 2-SAT formula is a conjunction $F = C_1 \wedge C_2 \wedge \ldots \wedge C_m$ of 2-clauses $C_1, C_2, \ldots, C_m$. Let
$\mathcal{C} = \{C_1, C_2, \ldots, C_m\}$ be the collection of clauses corresponding to $F$.

As usual in boolean algebra, 0 stands for the logical value FALSE, and 1 stands for the logical value
TRUE. A 2-SAT formula $F = F(x_1, x_2, \ldots, x_n)$ is said to be satisfiable if there exists a truth assignment
$\eta = (\eta_1, \eta_2, \ldots, \eta_n) \in \{0, 1\}^n$ such that $F(\eta_1, \eta_2, \ldots, \eta_n) = 1$. The formula $F$ is called SAT if
it is satisfiable and UNSAT otherwise.

In the standard model for a random 2-SAT formula we choose each of the possible $4\binom{n}{2}$ clauses independently
with probability $\alpha/2n$. In this paper we will study a more general model in which a 2-SAT formula
$F$ consists of a random subset $\mathcal{C}$ of clauses such that each clause appears in $\mathcal{C}$ independently and a clause
having $i$ positive literals is present in the formula with probability $\alpha_i/2n$, $i = 0, 1, 2$, for some constants
$\alpha_i \ge 0$. Of course, taking $\alpha_0 = \alpha_1 = \alpha_2 = \alpha$ we retrieve the standard model.
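
The sampling rule of the generalized model can be sketched in a few lines of code. This is only an illustration under stated assumptions: the function name `sample_formula`, the encoding of literals as signed integers (`+i` for the positive literal of variable i, `-i` for its negation) and the tuple `alpha = (alpha_0, alpha_1, alpha_2)` are choices made for the example and do not come from the paper.

```python
import itertools
import random


def sample_formula(n, alpha, rng):
    """Sample a random generalized 2-SAT formula on n variables.

    alpha = (alpha_0, alpha_1, alpha_2); a clause with i positive literals
    is included independently with probability alpha[i] / (2 n).
    Literals are signed integers: +i is positive, -i is negative (assumed encoding).
    """
    clauses = []
    # Every unordered pair of distinct variables gives 4 possible clauses.
    for i, j in itertools.combinations(range(1, n + 1), 2):
        for sign_i in (1, -1):
            for sign_j in (1, -1):
                positives = (sign_i > 0) + (sign_j > 0)
                if rng.random() < alpha[positives] / (2 * n):
                    clauses.append((sign_i * i, sign_j * j))
    return clauses


if __name__ == "__main__":
    formula = sample_formula(50, (1.5, 0.5, 1.5), random.Random(1))
    print(len(formula), "clauses sampled")
```

Taking all three entries of `alpha` equal recovers the standard model with clause probability $\alpha/2n$.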

Let $M$ be the branching matrix given by

$$M = \frac{1}{2}\begin{pmatrix} \alpha_1 & \alpha_0 \\ \alpha_2 & \alpha_1 \end{pmatrix}. \qquad (1)$$

Note that though $M$ is not symmetric in general, its eigenvalues are all real and given by $\frac{1}{2}(\alpha_1 \pm \sqrt{\alpha_0 \alpha_2})$.
Let $\rho = \frac{1}{2}(\alpha_1 + \sqrt{\alpha_0 \alpha_2})$ denote the largest eigenvalue of $M$. We show that $\rho$ is a crucial parameter for
satisfiability. In particular, our main result is the following theorem, which establishes that the generalized
2-SAT model undergoes a phase transition from satisfiability to unsatisfiability at $\rho = 1$.
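
As a quick numerical companion to the formula for the eigenvalues, the following sketch evaluates them from the three clause densities. The function names `branching_eigenvalues` and `rho` are hypothetical helpers introduced only for this illustration.

```python
import math


def branching_eigenvalues(a0, a1, a2):
    """Eigenvalues of M = (1/2) [[a1, a0], [a2, a1]], largest first.

    They equal (a1 +/- sqrt(a0 * a2)) / 2 and are real because a0, a2 >= 0.
    """
    root = math.sqrt(a0 * a2)
    return (a1 + root) / 2, (a1 - root) / 2


def rho(a0, a1, a2):
    """Largest eigenvalue of the branching matrix."""
    return branching_eigenvalues(a0, a1, a2)[0]


if __name__ == "__main__":
    # Standard model: all densities equal to alpha, so rho = alpha.
    print(rho(1.2, 1.2, 1.2))
    # No mixed clauses at all (alpha_1 = 0): rho = sqrt(a0 * a2) / 2.
    print(rho(4.0, 0.0, 1.0))
```

In the standard model the three densities coincide and the sketch returns $\rho = \alpha$, in agreement with the classical threshold at $\alpha = 1$.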



Theorem 1 Let $F$ be a random 2-SAT formula under the generalized model with parameter $\rho$.

(a) If $\rho < 1$ or $\alpha_0 \alpha_2 = 0$ then $F$ is satisfiable with probability tending to one as $n \to \infty$.

(b) If $\rho > 1$ and $\alpha_0 \alpha_2 > 0$ then $F$ is unsatisfiable with probability tending to one as $n \to \infty$.

Remark 1 It is easy to see and well known that the satisfiability threshold for 2-SAT remains the same for
variants of the model where the set of 2-clauses also contains clauses of the form $x \vee y$ where $x$ and $y$
may not be strongly distinct. Similarly, the threshold remains the same if $\alpha n$ clauses are chosen uniformly
at random instead of choosing each clause independently with probability $\alpha/2n$ (see Appendix A of
[BBC+01]). The same reasoning applies to the more general 2-SAT model considered here.



2 2-SAT and the implication digraph 

We will exploit the standard representation of a 2-SAT formula as a directed graph (see [BBC+01] for
example), called the implication digraph associated with the 2-SAT formula. This graph has $2n$ vertices,
labelled by the $2n$ literals. If the clause $(u \vee v)$ is present in the 2-SAT formula then we draw the two directed
edges $\bar{u} \to v$ and $\bar{v} \to u$. The directed edges can be thought of as logical implications since if there is a
directed edge $u \to v$ and $u = 1$, then for the formula to be satisfiable, it is necessary to have $v = 1$.
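
The construction of the implication digraph translates directly into code. The sketch below assumes literals are encoded as nonzero signed integers (`-u` is the complement of `u`) and clauses as pairs of such integers; the name `implication_digraph` is a hypothetical helper, not notation from the paper.

```python
from collections import defaultdict


def implication_digraph(clauses):
    """Build the implication digraph as an adjacency map literal -> set of literals.

    Each clause (u or v) contributes the two edges  not-u -> v  and  not-v -> u.
    Literals are signed integers, so the complement of u is -u (assumed encoding).
    """
    graph = defaultdict(set)
    for u, v in clauses:
        graph[-u].add(v)
        graph[-v].add(u)
    return graph


if __name__ == "__main__":
    # (x1 or x2) and (not-x2 or x3)
    g = implication_digraph([(1, 2), (-2, 3)])
    for literal in sorted(g):
        print(literal, "->", sorted(g[literal]))
```

Each clause contributes exactly two edges, which are contrapositives of one another.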

By a directed path (from $u$ to $v$), we mean a sequence of vertices $u_0 = u, u_1, u_2, \ldots, u_k = v$ such that
there is a directed edge $u_i \to u_{i+1}$ for all $i = 0, 1, \ldots, k - 1$. The length of this directed path is $k$.
A contradictory cycle is a union of two (not necessarily disjoint) directed paths: one starts from a literal $u$
and ends at its complement $\bar{u}$, the other starts from $\bar{u}$ and ends at $u$.

The following lemma connects the concept of satisfiability of the 2-SAT problem to the existence of a
contradictory cycle in the implication digraph. For a proof, see [BBC+01].

Lemma 1 A 2-SAT formula is satisfiable iff its implication digraph contains no contradictory cycle.
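
The criterion of Lemma 1 can be checked mechanically. The following is a deliberately simple sketch that tests, for every variable, whether both directed paths of a contradictory cycle exist, using breadth-first search; it is not the linear-time algorithm mentioned in the introduction, and the function names and the signed-integer encoding of literals are assumptions of the example.

```python
from collections import defaultdict, deque


def _reachable(graph, source, target):
    """Breadth-first search: is there a directed path from source to target?"""
    seen = {source}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        if u == target:
            return True
        for w in graph.get(u, ()):
            if w not in seen:
                seen.add(w)
                queue.append(w)
    return False


def is_satisfiable(n, clauses):
    """Return True iff the implication digraph has no contradictory cycle.

    Variables are 1..n; literal -i is the complement of literal +i.
    """
    graph = defaultdict(set)
    for u, v in clauses:
        graph[-u].add(v)  # not-u implies v
        graph[-v].add(u)  # not-v implies u
    for x in range(1, n + 1):
        if _reachable(graph, x, -x) and _reachable(graph, -x, x):
            return False
    return True


if __name__ == "__main__":
    print(is_satisfiable(2, [(1, 2), (-1, 2)]))                     # satisfiable
    print(is_satisfiable(2, [(1, 2), (1, -2), (-1, 2), (-1, -2)]))  # not satisfiable
```

The running time of this version is of order $n$ times the number of clauses, which is adequate for small experiments.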

3 Proof of Theorem 1 part (a)

The proof has some resemblance to the first moment arguments given in Chvatal and Reed [CR92]. In fact
our 'hooked chain' is the same as what they called a 'bicycle'. The extension to the more general case considered
here uses a recursive argument which allows us to deal with the multi-parameter general model.

Definition 1 Suppose there exist strongly distinct literals $y_1, y_2, \ldots, y_s$ and $u, v \in \{y_1, y_2, \ldots, y_s, \bar{y}_1,
\bar{y}_2, \ldots, \bar{y}_s\}$ such that $(\bar{u} \vee y_1), (\bar{y}_1 \vee y_2), \ldots, (\bar{y}_{s-1} \vee y_s), (\bar{y}_s \vee v) \in \mathcal{C}$ or, equivalently, there exists a directed path
$u \to y_1 \to y_2 \to \cdots \to y_s \to v$ in the implication digraph corresponding to the 2-SAT formula. We call this
sequence of literals a hooked chain (of length $s + 1$).

Lemma 2 If a 2-SAT formula is unsatisfiable then its implication digraph contains a hooked chain of length
$\ge 3$.

Proof. Suppose a 2-SAT formula is unsatisfiable. By Lemma 1 we have a contradictory cycle in the implication
digraph, say $u_0 \to u_1 \to u_2 \to \cdots \to u_l = \bar{u}_0 \to u_{l+1} \to u_{l+2} \to \cdots \to u_k = u_0$. The cycle
has at least one directed path from a literal to its complement. Choosing one that minimizes the length we
get an implication chain formed by a sequence of literals $u_h \to u_{(h+1) \bmod k} \to u_{(h+2) \bmod k} \to \cdots \to
u_{(h+t) \bmod k} = \bar{u}_h$ such that $u_{(h+1) \bmod k}, u_{(h+2) \bmod k}, \ldots, u_{(h+t) \bmod k}$ are strongly distinct. Find the largest
$s \ge t$ such that $u_{(h+1) \bmod k}, u_{(h+2) \bmod k}, \ldots, u_{(h+s) \bmod k}$ are strongly distinct. Let $v$ be the element pointed
to by $u_{(h+s) \bmod k}$ in the cycle. Then clearly $u_h \to u_{(h+1) \bmod k} \to u_{(h+2) \bmod k} \to \cdots \to u_{(h+s) \bmod k} \to v$ is
a hooked chain of length $s + 1$. Since there can be no edge between a literal $w$ and its complement $\bar{w}$, we
must have $t \ge 2$ and therefore $s \ge 2$. □



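Lemma 3

$$\mathbb{P}(F \text{ is unsatisfiable}) \le \frac{C}{n} \sum_{s=2}^{n} s^2 \left(T^+_{s-1} + T^-_{s-1}\right), \qquad (2)$$

where $T^+_{s-1}$ (or $T^-_{s-1}$) is the expected number of directed paths of length $(s - 1)$ started from $x_1$ (or $\bar{x}_1$)
consisting of strongly distinct literals, with $T^+_0 = T^-_0 = 1$ and $C = [\max(\alpha_0, \alpha_1, \alpha_2)]^2$.

Proof. Let $H_s$ be the number of hooked chains of length $(s + 1)$ in the implication digraph for 2-SAT and $T_s$
be the number of directed paths of $s$ strongly distinct literals in the same digraph. From Lemma 2,

$$\mathbb{P}(F \text{ is unsatisfiable}) \le \mathbb{P}(\exists \text{ a hooked chain of length } (s + 1) \text{ for some } s \ge 2) \qquad (3)$$

$$\le \sum_{s=2}^{n} \mathbb{E}(H_s) \qquad (4)$$

$$\le \sum_{s=2}^{n} \frac{C (2s)^2}{(2n)^2}\, \mathbb{E}(T_s) \qquad (5)$$

$$= \frac{C}{n} \sum_{s=2}^{n} s^2 \left(T^+_{s-1} + T^-_{s-1}\right). \qquad (6)$$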

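Step (4) follows from a simple union bound and Markov's inequality. Inequality (5) can be explained as follows:

$$H_s = \sum^{*}_{y_1, \ldots, y_s} \; \sum_{u, v \in \{y_1, \ldots, y_s, \bar{y}_1, \ldots, \bar{y}_s\}} \mathbf{1}\big((\bar{u} \vee y_1), (\bar{y}_1 \vee y_2), \ldots, (\bar{y}_s \vee v) \in \mathcal{C}\big).$$

Here $\sum^{*}$ means the sum is taken over all possible sets of strongly distinct literals of size $s$. Observe that
for a fixed choice of strongly distinct literals there are $2s$ choices for each of $u$ and $v$, and each clause occurs
with probability at most $\max(\alpha_0, \alpha_1, \alpha_2)/2n$. Now taking expectation and using independence between
clauses, we have

$$\mathbb{E}(H_s) = \sum^{*}_{y_1, \ldots, y_s} \; \sum_{u, v \in \{y_1, \ldots, y_s, \bar{y}_1, \ldots, \bar{y}_s\}} \mathbb{P}\big((\bar{y}_1 \vee y_2), \ldots, (\bar{y}_{s-1} \vee y_s) \in \mathcal{C}\big)\, \mathbb{P}\big((\bar{u} \vee y_1) \in \mathcal{C}\big)\, \mathbb{P}\big((\bar{y}_s \vee v) \in \mathcal{C}\big)$$

$$\le \frac{C (2s)^2}{(2n)^2} \sum^{*}_{y_1, \ldots, y_s} \mathbb{P}\big((\bar{y}_1 \vee y_2), (\bar{y}_2 \vee y_3), \ldots, (\bar{y}_{s-1} \vee y_s) \in \mathcal{C}\big)$$

$$= \frac{C (2s)^2}{(2n)^2}\, \mathbb{E}(T_s).$$

Noting that $\mathbb{E}(T_s) = n\,(T^+_{s-1} + T^-_{s-1})$, since the quantities $T^+_{s-1}$ and $T^-_{s-1}$ do not depend on the choice of $x_1$, step (6) follows.

□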



Lemma 4 Write $\mathbf{T}_k = (T^+_k, T^-_k)^T$. Then

$$\mathbf{T}_{s-1} \le M^{s-1} \mathbf{1}, \qquad (7)$$

where $M$ is defined in (1) and $\mathbf{1} = (1, 1)^T$.

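Proof. For a literal $u$ strongly distinct from $x_1$, let $J_u$ denote the number of directed paths of length $(s - 2)$
starting from $u$ consisting of strongly distinct literals and not involving the variable $x_1$. Then

$$T^+_{s-1} = \mathbb{E}\Big[ \sum_{u :\, \mathrm{var}(u) \ne \mathrm{var}(x_1)} J_u\, \mathbf{1}\big((\bar{x}_1 \vee u) \in \mathcal{C}\big) \Big] = \sum_{u :\, \mathrm{var}(u) \ne \mathrm{var}(x_1)} \mathbb{E}[J_u] \times \mathbb{P}\big((\bar{x}_1 \vee u) \in \mathcal{C}\big),$$

where both sums run over literals $u$. The last step follows from the independence of clauses.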

A simple coupling argument yields $\mathbb{E}[J_u] \le T^+_{s-2}$ or $T^-_{s-2}$ depending on whether the literal $u$ is positive or
negative. Combining the above facts, we get the following recursive inequality,

$$T^+_{s-1} \le n\, T^+_{s-2}\, \mathbb{P}(\bar{x}_1 \vee x_2) + n\, T^-_{s-2}\, \mathbb{P}(\bar{x}_1 \vee \bar{x}_2) \le \frac{\alpha_1}{2} T^+_{s-2} + \frac{\alpha_0}{2} T^-_{s-2} \qquad (8)$$

and similarly,

$$T^-_{s-1} \le \frac{\alpha_2}{2} T^+_{s-2} + \frac{\alpha_1}{2} T^-_{s-2}. \qquad (9)$$

Now the above two inequalities can be written in a more compact way as follows:

$$\mathbf{T}_{s-1} \le M\, \mathbf{T}_{s-2}. \qquad (10)$$

Iterating (10), we get $\mathbf{T}_{s-1} \le M^{s-1} \mathbf{T}_0 = M^{s-1} \mathbf{1}$. □

Proof of Theorem 1 part (a). We are now ready to finish the proof of part (a) of Theorem 1. If $\alpha_0 \alpha_2 = 0$
then either the all-zero or the all-one assignment always satisfies the formula. So, take $\alpha_0 \alpha_2 > 0$. Then $M$ is semisimple (i.e.
similar to a diagonal matrix).

By Lemma 4, $T^+_{s-1} + T^-_{s-1} \le \mathbf{1}^T M^{s-1} \mathbf{1} \le B \rho^{s-1}$ for some constant $B$. The last inequality holds
since $M$ is semisimple. Plugging it into (2), we finally have

$$\mathbb{P}(F \text{ is unsatisfiable}) \le \frac{K}{n} \sum_{s=2}^{n} s^2 \rho^{s-1} \quad \text{for some constant } K > 0$$

$$\le O(n^{-1}) \quad \text{since } \rho < 1. \qquad (11)$$

□
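
For the reader who wants the last step spelled out, the following short computation is one way to justify the $O(n^{-1})$ bound; it is offered as a sketch and is not taken from the original text. It uses only the standard closed form of the series $\sum_{s \ge 1} s^2 x^{s-1}$, obtained by differentiating the geometric series twice.

```latex
\[
\sum_{s=2}^{n} s^2 \rho^{s-1}
\;\le\; \sum_{s=1}^{\infty} s^2 \rho^{s-1}
\;=\; \frac{1+\rho}{(1-\rho)^3},
\qquad 0 \le \rho < 1,
\]
\[
\text{so that}\qquad
\mathbb{P}(F \text{ is unsatisfiable})
\;\le\; \frac{K}{n}\cdot\frac{1+\rho}{(1-\rho)^3}
\;=\; O(n^{-1}).
\]
```

The bound degenerates as $\rho \uparrow 1$, consistent with the threshold being located at $\rho = 1$.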



4 The Exploration Process 

Observe that when $\rho > 1$, we need to find a contradictory cycle in the implication digraph of the random
2-SAT formula with high probability. In order to prove this, we will show that starting from any fixed vertex
there is a constant probability that it implies a large number of literals in the digraph, meaning that there are
directed paths to a large number of vertices from the fixed vertex. To achieve this, we explore the digraph
dynamically starting from a fixed literal $x$ under certain rules and keep track of the variables that are implied by
$x$ at each step. We call this the exploration process, which is defined next.

Definition and Notations. Given a realization of the 2-SAT formula and an arbitrarily fixed literal x, we 
will consider an exploration process in its implication digraph starting from x. 

• This process describes the evolution of two sets of literals which will be called the exposed set and 
the active set. 

• A literal is said to be alive in a particular step of the process if it is strongly distinct from those in the 
exposed set and from those in the active set at that step. 

• We maintain two stacks for the literals in active set, one for positive literals and another for negative 
ones. 

• At each step we pop a literal (call it the current literal) from one of the two stacks of the active set,
depending on some event to be described later, and expose it. This means that we look for all the literals
that are alive at that time and to which there is a directed edge from the current literal.

• We then put those new literals in the stacks of the active set (positive or negative) in some predetermined
order and move the current literal to the exposed set.

• We go on repeating this procedure until the stack of the active literals becomes empty and the process 
stops. 

• Mathematically, let $E_t$ and $A_t$ denote respectively the exposed set and the active set of literals at time
$t$. Also let $U_t = \{\mathrm{var}(u) : \mathrm{var}(u) \notin \mathrm{var}(E_t) \cup \mathrm{var}(A_t)\}$ be the set of alive variables at time $t$. Set
$E_0 = \emptyset$, $A_0 = \{x\}$. If $A_t$ is non-empty and the literal $l \in A_t$ is exposed from the stack, then we have
the following updates at time $t + 1$:

$$A_{t+1} = (A_t \setminus \{l\}) \cup \{u : u \text{ literal s.t. } \mathrm{var}(u) \in U_t \text{ and the clause } (\bar{l} \vee u) \text{ is present}\}, \qquad E_{t+1} = E_t \cup \{l\}.$$

If $A_t$ is empty, then so is $A_{t+1}$, and $E_{t+1}$ is the same as $E_t$.

Note that during the evolution of the process, each clause is examined only once. Also every literal in
$A_t \cup E_t$ can be reached from $x$ via a directed path (consisting of strongly distinct literals).
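
The bookkeeping of the exploration process can be illustrated on a given formula as follows. This is a sketch under explicit assumptions: literals are signed integers, the rule for choosing which stack to pop from is taken to be "positive stack first" purely for concreteness (the text leaves this choice to an event specified later), and the names `explore` and `max_steps` are hypothetical.

```python
from collections import defaultdict


def explore(clauses, start, max_steps=None):
    """Run the exploration process from the literal `start`.

    Returns (exposed, active): the exposed literals in order of exposure and
    the literals still active when the process stops or max_steps is reached.
    """
    graph = defaultdict(set)
    for u, v in clauses:
        graph[-u].add(v)  # clause (u or v) gives edge not-u -> v
        graph[-v].add(u)  # and edge not-v -> u

    exposed = []
    stacks = {True: [], False: []}   # True: positive literals, False: negative
    stacks[start > 0].append(start)
    dead_vars = {abs(start)}         # variables of exposed or active literals
    steps = 0
    while (stacks[True] or stacks[False]) and (max_steps is None or steps < max_steps):
        # Assumed rule: pop from the positive stack whenever it is non-empty.
        stack = stacks[True] if stacks[True] else stacks[False]
        current = stack.pop()
        # Alive literals (variable in U_t) with an edge from the current literal.
        new = [w for w in sorted(graph.get(current, ())) if abs(w) not in dead_vars]
        for w in new:
            stacks[w > 0].append(w)
        dead_vars.update(abs(w) for w in new)
        exposed.append(current)
        steps += 1
    return exposed, stacks[True] + stacks[False]


if __name__ == "__main__":
    # Edges: x1 -> x2 and x2 -> x3 (plus their contrapositives).
    print(explore([(-1, 2), (-2, 3)], 1))
```

Because aliveness is decided from the set of alive variables before the update, a literal and its complement can both enter the active set in the same step, as in the update rule for $A_{t+1}$.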

For a subset $S$ of literals, we can partition it as $S = S^+ \cup S^-$ where $S^+$ (resp. $S^-$) is the set of all
positive (resp. negative) literals of $S$. Let $u_t, a^+_t, a^-_t$ be the shorthand for $|U_t|$, $|A^+_t|$ and $|A^-_t|$, where $|\cdot|$
means the size of a set. Set $a_t := |A_t| = a^+_t + a^-_t$.

Distribution of the process. The stochastic description of the evolution of the process $(u_t, a^+_t, a^-_t)$, $0 \le
t \le n$, for a random 2-SAT formula on $n$ variables can be summarized in the next lemma, whose proof is
immediate.



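Lemma 5 Define a triangular array of independent Bernoulli random variables

$$W^t_i \sim \mathrm{Ber}(\alpha_0/2n); \quad X^t_i \sim \mathrm{Ber}(\alpha_1/2n); \quad Z^t_i \sim \mathrm{Ber}(\alpha_2/2n), \qquad 1 \le i \le n,\ 0 \le t \le n.$$

Let $A_t \ne \emptyset$. Given $\mathcal{H}(t)$, the history up to time $t$, and that the current literal at time $t$ is positive, we have

$$u_t - u_{t+1} = \sum_{i=1}^{u_t} \{(W^t_i + X^t_i) \wedge 1\} \stackrel{d}{=} \mathrm{Bin}\big(u_t,\ \alpha_0/2n + \alpha_1/2n - \alpha_0 \alpha_1/4n^2\big),$$

$$a^+_{t+1} - a^+_t = -1 + \sum_{i=1}^{u_t} X^t_i \stackrel{d}{=} -1 + \mathrm{Bin}(u_t, \alpha_1/2n), \qquad a^-_{t+1} - a^-_t = \sum_{i=1}^{u_t} W^t_i \stackrel{d}{=} \mathrm{Bin}(u_t, \alpha_0/2n).$$

Similarly, given $\mathcal{H}(t)$ and conditional on the event that the current literal at time $t$ is negative, we have

$$u_t - u_{t+1} = \sum_{i=1}^{u_t} \{(X^t_i + Z^t_i) \wedge 1\} \stackrel{d}{=} \mathrm{Bin}\big(u_t,\ \alpha_1/2n + \alpha_2/2n - \alpha_1 \alpha_2/4n^2\big),$$

$$a^+_{t+1} - a^+_t = \sum_{i=1}^{u_t} Z^t_i \stackrel{d}{=} \mathrm{Bin}(u_t, \alpha_2/2n), \qquad a^-_{t+1} - a^-_t = -1 + \sum_{i=1}^{u_t} X^t_i \stackrel{d}{=} -1 + \mathrm{Bin}(u_t, \alpha_1/2n).$$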



Definition 2 For the rest of the paper, we fix $T = \lfloor \sqrt{n} \rfloor$. Let $\tau := \sup\{t \le T : u_t \ge u_0 - 2\alpha T\}$
where $\alpha = \max(\alpha_0, \alpha_1, \alpha_2)$. In words, $\tau$ is the last time before $T$ such that the decrease in the number of
unexposed variables is at most $2\alpha T$.

We define a round of the exploration process as follows. We fix a subset $S$ of variables of size $N \ge
(1 - \delta/2) n$ for some small $\delta > 0$ such that $(1 - \delta)\rho > 1$ and a starting literal, say $x \in S$. We first
run the exploration process from $x$ on the implication digraph restricted to literals from $S$ up to time $\tau$. If
$\tau < T$, we stop. Otherwise, we delete all the variables in $\mathrm{var}(E_T \cup A_T) \setminus \{\mathrm{var}(x)\}$ from $S$ to get a new
set of variables $S' \subset S$. By the definition of $\tau$, $|S'| \ge N - 2\alpha T$. Now we run another independent
exploration process starting from $\bar{x}$ up to time $\tau$ but on the digraph restricted to literals from $S'$. We again
have the stopping rule of $\tau < T$.

Lemma 6 The (random) set of clauses examined during the evolution of an exploration process up to time
$T$ is disjoint from the set $\{(\bar{u} \vee \bar{v}) : u, v \in A_T\}$. Further, the clauses examined during the evolution of the
second exploration process of a round are distinct from the clauses in the set $\{(\bar{u} \vee \bar{v}) : u, v \in A_T\} \cup \mathcal{D}$,
where $A_T$ (resp. $\mathcal{D}$) is the active set at time $T$ (resp. the set of clauses examined during the evolution) of
the first exploration process.

Proof. The first statement of the lemma follows from the easy observation that if $u \in A_T$, then, from the
very definition of the exploration process, the clause $(\bar{u} \vee w)$ is not examined up to time $T$ for all literals $w$
such that $\bar{w} \notin E_T$.

For the second statement, note that any problematic clause should include the literal $x$ or $\bar{x}$ ($x$ = starting
vertex). Now all the clauses involving $\mathrm{var}(x)$ which are examined during the first exploration process of a
round must have the form $(\bar{x} \vee y)$ where the literal $y$ is such that $\mathrm{var}(y) \ne \mathrm{var}(x)$. But the clauses involving
$\mathrm{var}(x)$ scanned by the second exploration process of the round are all of the form $(x \vee y)$ where the literal $y$ is such
that $\mathrm{var}(y) \ne \mathrm{var}(x)$, and there can be no clause from the set $\{(\bar{u} \vee \bar{v}) : u, v \in A_T\}$ which contains either
$x$ or $\bar{x}$. □

Corollary 1 Given whether each of the clauses in $\{(\bar{u} \vee \bar{v}) : u, v \in A_T\} \cup \mathcal{D}$ is present in $\mathcal{C}$ or not, the
distribution of the evolution of the second exploration process only depends on the number of variables with
which the second process starts.

4.1 Proof of Theorem 1 part (b) when the $\alpha_i$'s are all equal

Before tackling the general situation we pause for a moment to give a quick sketch, after [Ver99], of the
unsatisfiability part of the phase transition for the standard 2-SAT model. This will serve as a prelude to the proof
for the general case.

Let $\alpha = \alpha_0 = \alpha_1 = \alpha_2 > 1$. Then $\rho = \alpha$. In this special case, we slightly modify our exploration
process by demanding that we always choose the current literal uniformly from the set of active literals.
Thus at each time $t \ge 1$, given its size $a_t$, $A_t$ is uniformly random over all the literals except $x$ and $\bar{x}$, the
starting vertex and its complement.

Since the probabilities for the clauses to be present are all equal, the distribution of the exploration
process $(u_t, a_t)$ simplifies. Given $\mathcal{H}(t)$, the history up to time $t$, and $a_t > 0$,

$$u_t - u_{t+1} \stackrel{d}{=} \mathrm{Bin}(u_t, 2p_n - p_n^2), \qquad a_{t+1} - a_t \stackrel{d}{=} -1 + \mathrm{Bin}(2u_t, p_n), \qquad \text{where } p_n = \alpha/2n. \qquad (12)$$

Note that each of the random variables $(u_t - u_{t+1})$ is stochastically dominated by $\mathrm{Bin}(n, 2p_n)$, which has
mean $\alpha$. Thus using concentration of the Binomial distribution it is easy to see that for time $T = \lfloor \sqrt{n} \rfloor$, the
event $\{\tau < T\} = \{\sum_{t=1}^{T} (u_{t-1} - u_t) > 2\alpha T\}$ occurs with probability at most $A \exp(-cT)$ where $c > 0$.
See Lemma 7 in Section 4.2 for a proof of a more general fact.

Let $\delta > 0$ be as given in Definition 2. If $u_0 \ge (1 - \delta/2) n$, then $\{\tau = T\} \subset \{u_T \ge (1 - \delta) n\}$. When
$\tau = T$, the process $\{a_t, t \ge 1\}$ behaves like a random walk with positive drift on the nonnegative integers with
0 as the absorbing state and hence

$$\exists\, C > 0 \text{ such that } \mathbb{P}(a_T \ge CT,\ \tau = T) \ge \zeta \text{ for some constant } \zeta > 0, \text{ independent of } n. \qquad (13)$$

If both $u$ and $\bar{u}$ are in $A_T$ for some literal $u$, we have a directed path from the starting vertex to its
complement using the literals in $E_T \cup A_T$. Else, each pair of literals $u, v \in A_T$ are strongly distinct. There
are edges $u \to \bar{v}$ and $v \to \bar{u}$ in the digraph if the clause $(\bar{u} \vee \bar{v})$ is present in the formula. If at least one
of these $\binom{CT}{2}$ many clauses is present, then we again have a directed path from the starting vertex to its
complement using the literals in $E_T \cup A_T$. Let $D$ be the event that there exists a directed path from the
starting vertex of the exploration process to its complement in $E_T \cup A_T$. Therefore, by Lemma 6,

$$\mathbb{P}(D \mid a_T \ge CT,\ \tau = T) \ge 1 - (1 - \alpha/2n)^{\binom{CT}{2}} \ge p, \quad \text{where } p > 0 \text{ is a constant, independent of } n,$$

which implies that

$$\mathbb{P}(D,\ \tau = T) \ge \mathbb{P}(D \mid a_T \ge CT,\ \tau = T)\, \mathbb{P}(a_T \ge CT,\ \tau = T) \ge p\zeta > 0.$$

Now, by Corollary 1, we can say that after a round, the probability that there is no termination and there
exists a contradictory cycle in the variables visited during the round is at least $p^2 \zeta^2$.

We continue with another round of the exploration process in the deleted graph containing only unvisited
variables. We repeat this process until a round stops due to the stopping rule. If each of the successive rounds
does not terminate, we have $\Theta(\sqrt{n})$ rounds of the exploration process before the event $\{u_t < (1 - \delta/2) n\}$
occurs. It is easy to see that the clauses examined in different rounds of the exploration process are all distinct and
hence the rounds are independent.

Thus, the probability that we get no contradictory cycle in all the rounds is at most

$$\mathbb{P}(\text{no contradictory cycle and no round stops}) + \mathbb{P}(\text{one of the rounds stops})$$

$$\le (1 - p^2 \zeta^2)^{\Theta(\sqrt{n})} + \big(1 - (1 - A \exp(-cT))^{2\Theta(\sqrt{n})}\big)$$

$$\le \mathrm{Const} \times \exp(-B\sqrt{n}) \quad \text{for some } B > 0.$$

□

Remark 2 Instead of taking $T = \lfloor \sqrt{n} \rfloor$ as in the proof, if we choose $T = \Theta(n)$ suitably, then (13) yields that
$a_T \ge \Theta(n)$ with probability at least $r$ for some $r > 0$. Thus for any literal $y \ne x, \bar{x}$, we get $\mathbb{P}(y \in A_T) \ge p$
for some $p > 0$ and for all $n$ large enough. So, the probability that there is a directed path from $x_1$ to $x_2$
is at least $p$. The same holds true for a directed path from $x_2$ to $\bar{x}_1$. These are monotonic events. So, by
the FKG inequality [FKG71], they occur simultaneously with probability greater than or equal to $p^2$. Thus
the chance that there exists a directed path from $x_1$ to $\bar{x}_1$ is at least $p^2$. Again applying FKG, we have a
contradictory cycle with probability at least $p^4$. Now appealing to Friedgut's theorem on sharp thresholds
[Fri99], we can conclude that the formula is UNSAT with probability tending to one as $n \to \infty$.

4.2 Associated 2-type Branching Process 

Now we are back to the general case. Given an exploration process on a subgraph of the implication digraph
consisting of $N = \Theta(n)$ many variables starting from any fixed literal, our goal is to couple it with a
suitable 2-type supercritical branching process up to time $T = \lfloor \sqrt{n} \rfloor$ on a set of high probability. Assume
$N \ge (1 - \delta/2) n$ where $\delta > 0$ is such that $(1 - \delta)\rho > 1$.

Thus on the set $\{u_t \ge N - 2\alpha T,\ \forall t \le T\}$, for large enough $n$, $\mathrm{Bin}(u_t, \alpha_i/2n)$ stochastically
dominates $\mathrm{Bin}((1 - \delta) n, \alpha_i/2n)$ for all times $t \le T$. Next we are going to prove that this event happens
with high probability.

Lemma 7 Let $T$, $\delta$ be as above and $\alpha = \max(\alpha_0, \alpha_1, \alpha_2)$. Then $\mathbb{P}(\tau < T) \le 2\exp(-\alpha T/2)$. Therefore,
$\mathbb{P}(u_t \ge (1 - \delta) n\ \forall t \le T) \ge 1 - 2\exp(-\alpha T/2)$ for sufficiently large $n$.

Proof. Since $u_t$ is decreasing in $t$ and $N - 2\alpha T \ge (1 - \delta) n$ for sufficiently large $n$, it is enough to prove
that

$$\mathbb{P}(\tau < T) \le 2\exp(-\alpha T/2).$$

Note that $u_0 = N$. Clearly, $u_{t-1} - u_t$ is conditionally independent of $u_0 - u_1, u_1 - u_2, \ldots, u_{t-2} - u_{t-1}$
given $u_{t-1}$ and the type of the current literal at time $t$, and is stochastically dominated by $\mathrm{Bin}(2N, \alpha/2n)$
irrespective of the conditioning event. Therefore, the distribution of $u_0 - u_T$ is stochastically dominated by
$\mathrm{Bin}(2NT, \alpha/2n)$. By Bernstein's inequality,

$$\mathbb{P}(\mathrm{Bin}(2NT, \alpha/2n) > 2\alpha T) \le 2\exp(-\alpha T/2).$$

Therefore,

$$\mathbb{P}(\tau < T) = \mathbb{P}(u_T < N - 2\alpha T) = \mathbb{P}(u_0 - u_T > 2\alpha T) \le 2\exp(-\alpha T/2).$$

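□

Lemma 8 There exist bounded distributions $F_i$ with mean $m_i$, $0 \le i \le 2$, such that

1. For all sufficiently large $n$,

$$\mathbb{P}(\mathrm{Bin}(n(1 - \delta), \alpha_i/2n) = k) \ge \mathbb{P}(X = k \mid X \sim F_i) \quad \forall k \ge 1.$$

2. If $M_0$ is the branching matrix given by

$$M_0 = \begin{pmatrix} m_1 & m_0 \\ m_2 & m_1 \end{pmatrix} \qquad (14)$$

then $\rho_0 > 1$, where $\rho_0$ is the maximum eigenvalue of $M_0$.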



Proof. Fix some $\beta \in (0, 1)$ so that $(1 - \delta)(1 - \beta)\rho > 1$. Let $\gamma_i = (1 - \delta)\alpha_i/2$ for $i = 0, 1, 2$. Find $c$ large
enough so that

$$\sum_{k=1}^{c} k\,(1 - \beta/2) \exp(-\gamma_i)\, \gamma_i^k / k! > (1 - \beta)\gamma_i, \qquad i = 0, 1, 2.$$

For each $0 \le i \le 2$, let us now define a truncated (and reweighted) Poisson distribution which takes the
value $k$ with probability $(1 - \beta/2) \exp(-\gamma_i)\gamma_i^k/k!$ for $1 \le k \le c$ and puts the remaining mass on 0. Call this distribution
$F_i$. By the choice of $c$, its mean $m_i$ is greater than $(1 - \beta)\gamma_i$. Poissonian convergence says $\mathrm{Bin}(n(1 - \delta), \alpha_i/2n) \to \mathrm{Poisson}(\gamma_i)$,
and conclusion 1 of the lemma follows.

Also, $\rho_0 = m_1 + \sqrt{m_0 m_2} > (1 - \beta)\big(\gamma_1 + \sqrt{\gamma_0 \gamma_2}\big) = (1 - \delta)(1 - \beta)\rho > 1$. □
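
The choice of the cutoff $c$ in the proof can be explored numerically. The sketch below computes the mean of the truncated and reweighted Poisson law and searches for the smallest admissible cutoff; the function names `truncated_mean` and `smallest_cutoff`, the search bound `max_c`, and the restriction to `gamma > 0` are assumptions made for this illustration only.

```python
import math


def truncated_mean(gamma, beta, c):
    """Mean of the law with mass (1 - beta/2) e^{-gamma} gamma^k / k! at k = 1..c
    and the remaining mass at 0."""
    term = math.exp(-gamma)       # Poisson(gamma) mass at k = 0
    mean = 0.0
    for k in range(1, c + 1):
        term *= gamma / k         # Poisson(gamma) mass at k
        mean += k * (1 - beta / 2) * term
    return mean


def smallest_cutoff(gamma, beta, max_c=200):
    """Smallest c with truncated mean > (1 - beta) * gamma (assumes gamma > 0)."""
    if gamma <= 0 or not 0 < beta < 1:
        raise ValueError("need gamma > 0 and 0 < beta < 1")
    for c in range(1, max_c + 1):
        if truncated_mean(gamma, beta, c) > (1 - beta) * gamma:
            return c
    raise ValueError("no admissible cutoff found up to max_c")


if __name__ == "__main__":
    gamma, beta = 0.6, 0.1
    c = smallest_cutoff(gamma, beta)
    print(c, truncated_mean(gamma, beta, c), (1 - beta) * gamma)
```

A finite cutoff exists for every $\gamma > 0$ because the untruncated mean $(1 - \beta/2)\gamma$ strictly exceeds $(1 - \beta)\gamma$.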

Definition 3 Let us have a supercritical 2-type branching process, which we will call an F-branching process,
with offspring distributions as follows:

$$\text{1st type} \to (\text{1st type}, \text{2nd type}) : (F_1, F_0) \text{ indep}, \qquad \text{2nd type} \to (\text{1st type}, \text{2nd type}) : (F_2, F_1) \text{ indep}. \qquad (15)$$

Next we define a new process $\mathbf{X}(t) = (X_1(t), X_2(t))$ by sequentially traversing the Galton-Watson tree
of the F-branching process. We fix a suitable order among the types of nodes of the tree and moreover we
always prefer to visit a node of type 1 to a node of type 2. Then we traverse the tree sequentially and at each
step we expand the tree by including all the children of the node we visit. Let us denote the number of unvisited
or unexplored children of type $i$ in the tree traversed up to time $t$ by $X_i(t)$.

Lemma 9 There exists a coupling such that $(a^+_t, a^-_t) \ge \mathbf{X}(t)$ for all $t \le \tau$ and $n$ large enough.

Proof. Fix $n$ sufficiently large. If the starting vertex of the exploration process is of positive type, we initiate
the branching process with one individual of type 1. Similarly for the other case. We run in parallel the
exploration process, where the choice of the type of the current literal at time $t$ depends on the type of the
visited node at time $t$. This can be done because, if $t \le \tau$, we can always simultaneously choose our random
variables in such a way (by Lemmas 5 and 8) that for every step, the number of active literals generated of
each type is no less than the number of unvisited nodes of the corresponding type in the tree grown up to
that step. If $\tau < t \le T$ or if we have no unvisited child left in the tree, then we choose the current literal
from the active set by some fixed predetermined procedure. □

Next we are going to find a lower bound on the total number of unvisited children after $T$ steps of the
above process.



Lemma 10 Suppose $\mathbf{X}(t)$ is as in Definition 3 with $\mathbf{X}(0) = (1, 0)$ or $(0, 1)$. Then $\exists\, C > 0$, $\eta > 0$ such that
$\mathbb{P}(X_1(T) + X_2(T) \ge CT) \ge \eta$.



Proof. Though a proof of the above lemma can be found implicitly in [KS67], we present it here for the sake of
completeness. Recall that the F-branching process is supercritical as $\rho_0$, the maximum eigenvalue of $M_0$, is
strictly greater than 1.

If we assume $\alpha_1 > 0$, trivially, this process is positive regular and non-singular. Thus, by a well-known
result (see [Har02]) on supercritical multitype branching processes, its extinction probability is given
by $0 \le \mathbf{q} = (q_1, q_2) < \mathbf{1}$, where $q_i$ is the probability that the process becomes extinct starting with one
object of type $i$.

Note that if $\alpha_1 = 0$, we no longer have positive regularity. In that case, though the above theorem cannot
be directly applied, we can argue as follows to get the same conclusion. If the process starts with only
one individual of type $i$, the corresponding branching process can be viewed as a single type supercritical
branching process (made of the individuals of type $i$ only) if we observe the process only at even numbers
of steps. So, the probability that it eventually dies out, which is nothing but $q_i$, is strictly less than one.

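Let us denote $e_1 = (1, 0)$, $e_2 = (0, 1)$. Instead of looking at $\mathbf{X}(t)$, which has $(0, 0)$ as an absorbing state,
we will consider a new chain $\tilde{\mathbf{X}}(t)$ starting from $\tilde{\mathbf{X}}(0) = \mathbf{X}(0)$ which is supported on the entire $\mathbb{Z}^2$. Given
$\tilde{\mathbf{X}}(0), \tilde{\mathbf{X}}(1), \ldots, \tilde{\mathbf{X}}(t)$, define

$$\tilde{\mathbf{X}}(t+1) = \begin{cases} \tilde{\mathbf{X}}(t) - e_1 + (\xi_1, \xi_0) & \text{if } \tilde{X}_1(t) > 0, \\ \tilde{\mathbf{X}}(t) - e_2 + (\xi_2, \xi_1) & \text{o.w.,} \end{cases} \qquad (16)$$

where $\xi_i \sim F_i$ are drawn independently at each step.

We can couple $\mathbf{X}(t)$ and $\tilde{\mathbf{X}}(t)$ together so that $\tilde{\mathbf{X}}(t) = \mathbf{X}(t)$ until $\mathbf{X}(t)$ reaches $(0, 0)$.

Let $(a, b)$ be a normalized eigenvector of $M_0$ corresponding to the eigenvalue $\rho_0$ so that $a^2 + b^2 = 1$. Since
$\alpha_0, \alpha_2 > 0$ we have both $a > 0$ and $b > 0$. Let $Z(t) := aX_1(t) + bX_2(t)$ and $\tilde{Z}(t) := a\tilde{X}_1(t) + b\tilde{X}_2(t)$. Let
$\mathcal{F}_t := \sigma(\tilde{\mathbf{X}}(0), \tilde{\mathbf{X}}(1), \ldots, \tilde{\mathbf{X}}(t))$ and $\Delta\tilde{Z}(t) := \tilde{Z}(t + 1) - \tilde{Z}(t)$. Since the $F_i$'s are bounded, so are the $\Delta\tilde{Z}(t)$'s.
Then

$$\mathbb{E}(\Delta\tilde{Z}(t) \mid \mathcal{F}_t) = \begin{cases} (\rho_0 - 1)a & \text{if } \tilde{X}_1(t) > 0 \\ (\rho_0 - 1)b & \text{o.w.} \end{cases} \;\ge\; \mu := (\rho_0 - 1)\min(a, b) > 0. \qquad (17)$$

Now we have,

$$\mathbb{P}(\tilde{Z}(T) < \mu T/2) \le \mathbb{P}\Big( \sum_{t=0}^{T-1} \big(\Delta\tilde{Z}(t) - \mathbb{E}(\Delta\tilde{Z}(t) \mid \mathcal{F}_t)\big) < -\mu T/2 \Big)$$

$$\le \frac{\mathbb{E}\Big[ \Big( \sum_{t=0}^{T-1} \big(\Delta\tilde{Z}(t) - \mathbb{E}(\Delta\tilde{Z}(t) \mid \mathcal{F}_t)\big) \Big)^2 \Big]}{\mu^2 T^2/4} = O(T^{-1}).$$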

In the last line we use orthogonality of the increments, boundedness of $\Delta\tilde{Z}(t)$ and Chebyshev's inequality.
Therefore we can conclude,

$$\mathbb{P}(X_1(T) + X_2(T) \ge \mu T/2 \mid \mathbf{X}(0) = e_i) \ge \mathbb{P}(Z(T) \ge \mu T/2 \mid \mathbf{X}(0) = e_i)$$

$$\ge \mathbb{P}(\tilde{Z}(T) \ge \mu T/2,\ \mathbf{X}(t) \ne 0\ \forall\, t \le T \mid \mathbf{X}(0) = e_i)$$

$$\ge \mathbb{P}(\mathbf{X}(t) \ne 0\ \forall\, t \ge 0 \mid \mathbf{X}(0) = e_i) - \mathbb{P}(\tilde{Z}(T) < \mu T/2 \mid \mathbf{X}(0) = e_i)$$

$$\ge (1 - q_i) - O(T^{-1}).$$

The second inequality uses the fact that $Z(t) = \tilde{Z}(t)$ until $\mathbf{X}(t)$ reaches $(0, 0)$.

□

5 Proof of Theorem 1 part (b)

Lemma 11 In one round of the exploration process on a subgraph involving $N \ge (1 - \delta/2) n$ many variables,
the probability that (i) there is no termination due to the stopping rule and (ii) there exists a contradictory
cycle using the variables visited through the round is at least $p^2 \zeta^2$ for some $p, \zeta > 0$.

Proof. From Lemmas 9 and 10 we get

$$\mathbb{P}(a_T \ge CT,\ \tau = T) \ge \mathbb{P}(X_1(T) + X_2(T) \ge CT,\ \tau = T)$$

$$\ge \mathbb{P}(X_1(T) + X_2(T) \ge CT) - \mathbb{P}(\tau < T)$$

$$\ge \eta - 2\exp(-\alpha T/2) \ge \zeta > 0.$$

If both $u$ and $\bar{u} \in A_T$ for some literal $u$, we have a directed path from the starting vertex to its complement
using the literals in $E_T \cup A_T$. Otherwise, for each pair of literals $u, v \in A_T$, which are strongly distinct,
there are edges $u \to \bar{v}$ and $v \to \bar{u}$ in the digraph if the clause $(\bar{u} \vee \bar{v})$ is present in the formula. If at least
one of these $\binom{CT}{2}$ many clauses is present, then we again have a directed path from the starting vertex to its
complement using the literals in $E_T \cup A_T$.

Case I: $\alpha_1 > 0$. Let $\alpha_{\min} = \min(\alpha_0, \alpha_1, \alpha_2) > 0$. Let $D$ be the event that there exists a directed path from the
starting vertex of the exploration process to its complement in $E_T \cup A_T$. Then, by Lemma 6,

$$\mathbb{P}(D \mid a_T \ge CT \text{ and } \tau = T) \ge 1 - (1 - \alpha_{\min}/2n)^{\binom{CT}{2}} \ge p > 0.$$

Case II: $\alpha_1 = 0$. Now $\alpha_{\min} = 0$ and we cannot make the above statement. But then, instead of looking at
all $(\bar{u} \vee \bar{v})$ clauses where $u, v$ are strongly distinct literals belonging to $A_T$, we only consider such clauses
where $u$ and $v$ have the same parity. Since we have at least $2\binom{CT/2}{2}$ many clauses of that type, we have, similarly
to Case I,

$$\mathbb{P}(D \mid a_T \ge CT \text{ and } \tau = T) \ge 1 - (1 - \alpha'/2n)^{2\binom{CT/2}{2}} \ge p > 0,$$

where $\alpha' = \min(\alpha_0, \alpha_2) > 0$.
Therefore,

$$\mathbb{P}(D,\ \tau = T) \ge \mathbb{P}(D \mid a_T \ge CT \text{ and } \tau = T)\, \mathbb{P}(a_T \ge CT \text{ and } \tau = T) \ge p\zeta.$$

The lemma is now immediate from Corollary 1. □

Remark 3 From the proof of the above lemma, we have seen that for large $n$, with probability at least $r$
for some $r > 0$, we have a directed path in the implication digraph from any literal $u$ to its complement
$\bar{u}$. Invoking FKG, we can say that we can find a contradictory cycle with probability at least $r^2 > 0$. But
note that we do not have a ready-made theorem like Friedgut's sharp threshold result for the generalized 2-SAT
model. Though we believe that tweaking Lemma 4 of [HM06] may help, we will not pursue that. Instead
we take a different route to bypass the problem.

Proof of Theorem 1 part (b). We now show how to bootstrap this positive probability event to an event
with high probability.

Initially we run a round of the exploration process on the entire set of variables starting from $x_1$. If the
process does not terminate after the first round, out of at least $n - 4\alpha T - 1$ many unvisited variables, we
pick an arbitrary one and run another round of the exploration process in the deleted graph starting from it.
We repeat this process $\delta\sqrt{n}/9\alpha \le \delta n/2(4\alpha T + 1)$ many times, provided that we do not have to stop before,
each time discarding previously visited variables to achieve independence among the different rounds. We
thus ensure that in each run of the exploration process, we have at least $(1 - \delta/2) n$ many variables to start with.

We conclude that the probability that we get no contradictory cycle in all the rounds is at most

$$\mathbb{P}(\text{no contradictory cycle and no round stops}) + \mathbb{P}(\text{one of the rounds stops})$$

$$\le (1 - p^2 \zeta^2)^{\delta\sqrt{n}/9\alpha} + \big(1 - (1 - 2\exp(-\alpha T/2))^{2\delta\sqrt{n}/9\alpha}\big)$$

$$\le \mathrm{Const} \times \exp(-B\sqrt{n}) \quad \text{for some } B > 0.$$

This concludes the proof. □